Semi-Supervised Learning of the Electronic Health Record with Denoising Autoencoders for Phenotype Stratification
نویسندگان
چکیده
Patient interactions with health care providers result in entries to electronic health records (EHRs). EHRs were built for clinical and billing purposes but contain many data points about an individual. Mining these records provides opportunities to extract electronic phenotypes that can be paired with genetic data to identify genes underlying common human diseases. This task remains challenging: high quality phenotyping is costly and requires physician review; many fields in the records are sparsely filled; and our definitions of diseases are continuing to improve over time. Here we develop and evaluate a semi-supervised learning method for EHR phenotype extraction using denoising autoencoders for phenotype stratification. By combining denoising autoencoders with random forests we find classification improvements across simulation models, particularly in cases where only a small number of patients have high quality phenotype. This situation is commonly encountered in research with EHRs. Denoising autoencoders perform dimensionality reduction allowing visualization and clustering for the discovery of new subtypes of disease. This method represents a promising approach to clarify disease subtypes and improve genotype-phenotype association studies that leverage EHRs.
منابع مشابه
Semi-supervised learning of the electronic health record for phenotype stratification
Patient interactions with health care providers result in entries to electronic health records (EHRs). EHRs were built for clinical and billing purposes but contain many data points about an individual. Mining these records provides opportunities to extract electronic phenotypes, which can be paired with genetic data to identify genes underlying common human diseases. This task remains challeng...
متن کاملDeconstructing the Ladder Network Architecture
The manual labeling of data is and will remain a costly endeavor. For this reason, semi-supervised learning remains a topic of practical importance. The recently proposed Ladder Network is one such approach that has proven to be very successful. In addition to the supervised objective, the Ladder Network also adds an unsupervised objective corresponding to the reconstruction costs of a stack of...
متن کاملOnline Semi-Supervised Learning with Deep Hybrid Boltzmann Machines and Denoising Autoencoders
Two novel deep hybrid architectures, the Deep Hybrid Boltzmann Machine and the Deep Hybrid Denoising Auto-encoder, are proposed for handling semisupervised learning problems. The models combine experts that model relevant distributions at different levels of abstraction to improve overall predictive performance on discriminative tasks. Theoretical motivations and algorithms for joint learning f...
متن کاملDenoising Adversarial Autoencoders: Classifying Skin Lesions Using Limited Labelled Training Data
We propose a novel deep learning model for classifying medical images in the setting where there is a large amount of unlabelled medical data available, but labelled data is in limited supply. We consider the specific case of classifying skin lesions as either malignant or benign. In this setting, the proposed approach – the semi-supervised, denoising adversarial autoencoder – is able to utilis...
متن کاملSemi-supervised Learning using Denoising Autoencoders for Brain Lesion Detection and Segmentation
The work explores the use of denoising autoencoders (DAEs) for brain lesion detection, segmentation, and false-positive reduction. Stacked denoising autoencoders (SDAEs) were pretrained using a large number of unlabeled patient volumes and fine-tuned with patches drawn from a limited number of patients ([Formula: see text], 40, 65). The results show negligible loss in performance even when SDAE...
متن کامل